Paris Airbnb - visualisation data project

Author: Wiktoria Ekwińska

The open-access dataset on which the analysis was conducted comes from the Airbnb platform and was downloaded via this website.

It contains data collected from that platform on residential homes in Paris that private individuals offer for short-term rental.

Goals

From the analysis we want to learn the following:

About Paris Airbnb data

Variables:

Source

Uploading the data

The dataset contains 18 columns and 49634 rows. Some variables have missing data, so the next step looks at the missing values in more detail.

Missing data appears in 6 columns. As it turns out, the column 'neighbourhood_group' doesn't contain any values, so it is removed. The columns 'id', 'host_name' and 'license' are dropped as well.
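A minimal sketch of how the missing values could be counted per column with pandas. The small frame below is made-up stand-in data, not the real listings file, and column names that are not quoted in the text (such as 'name' and 'price') are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the listings table (the real one has 18 columns and 49634 rows).
listings = pd.DataFrame({
    "name": ["Studio", None, "Loft"],
    "price": [80, 120, 95],
    "reviews_per_month": [0.5, np.nan, np.nan],
    "neighbourhood_group": [np.nan, np.nan, np.nan],
})

# Count missing values per column and keep only the columns that have any.
missing = listings.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

A column whose count equals the number of rows, like 'neighbourhood_group' here, carries no information at all.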

We're replacing missing data in the column 'reviews_per_month' with the value 0:
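The two cleaning steps described above could look roughly like this in pandas. This is a sketch on made-up rows, not the notebook's original cell; the column names are the ones quoted in the text, plus an assumed 'price' column.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows using the column names quoted in the text.
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "host_name": ["A", "B", "C"],
    "neighbourhood_group": [np.nan, np.nan, np.nan],
    "license": [None, "x", None],
    "price": [80, 120, 95],
    "reviews_per_month": [0.5, np.nan, 1.2],
})

# Drop the empty column and the other three columns named in the text.
listings = listings.drop(columns=["neighbourhood_group", "id", "host_name", "license"])

# Replace missing monthly review rates with 0, as described above.
listings["reviews_per_month"] = listings["reviews_per_month"].fillna(0)
print(listings)
```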

Basic statistics

First, note the high standard deviations, which indicate significant dispersion around the mean. The next step is therefore to visualise the distributions of all numeric variables with box plots and density plots (excluding 'host_id', 'latitude' and 'longitude').
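One way to put a number on "high standard deviation" is to compare it with the mean. The sketch below uses made-up values and an assumed 'minimum_nights' column; the ratio column `cv` (coefficient of variation) is an addition for illustration and is not part of the original analysis.

```python
import pandas as pd

# Made-up stand-in rows; one extreme price inflates the spread.
listings = pd.DataFrame({
    "host_id": [11, 12, 13, 14],
    "latitude": [48.85, 48.86, 48.87, 48.88],
    "longitude": [2.33, 2.34, 2.35, 2.36],
    "price": [50, 70, 90, 990],
    "minimum_nights": [1, 2, 3, 30],
})

# Describe the numeric columns, leaving out the identifier and the coordinates.
numeric = listings.select_dtypes("number").drop(columns=["host_id", "latitude", "longitude"])
stats = numeric.describe().T

# Standard deviation relative to the mean; values near or above 1 signal wide dispersion.
stats["cv"] = stats["std"] / stats["mean"]
print(stats[["mean", "std", "cv"]])
```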

Box plots

From the box plots we can see that there are a lot of outliers.

Distribution visualisation using density plots

The density plots show that none of the distributions resembles a normal distribution. All plots are quite similar, with long right tails.

Therefore, the next step is to remove the outliers.

Removing outliers

To remove outliers, the IQR (interquartile range) method was used.

The IQR describes the middle 50% of values when ordered from lowest to highest. To find it, first find the median (middle value) of the lower half and of the upper half of the data. These values are quartile 1 (Q1) and quartile 3 (Q3). The IQR is the difference between Q3 and Q1.

Additionally, we are going to remove the observations with a price equal to 0.
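The text does not say which cut-off was applied, so the sketch below assumes the conventional fences Q1 - 1.5·IQR and Q3 + 1.5·IQR and filters only an assumed 'price' column. The helper `remove_iqr_outliers` is a hypothetical name, and the prices are made up. Note that pandas computes quartiles by linear interpolation, which can differ slightly from the "median of each half" description.

```python
import pandas as pd

def remove_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value in `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]

# Made-up prices with one free listing and one extreme value.
listings = pd.DataFrame({"price": [0, 40, 60, 80, 100, 120, 140, 2500]})

# Step 1: drop values outside the IQR fences (k=1.5 is an assumption).
cleaned = remove_iqr_outliers(listings, "price")

# Step 2: drop listings with a price of 0, which the lower fence may not catch.
cleaned = cleaned[cleaned["price"] > 0]
print(cleaned["price"].tolist())
```

In this toy example the lower fence is negative, so the zero-price row survives the IQR filter and is only removed by the separate second step.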

Once again we check the descriptive statistics for the dataset, now after removing the outliers.

After removing the outliers, the standard deviations decreased. A significant difference was observed for the mean price, which dropped from 130 to almost 88 euro. The average number of reviews per apartment is 12, whereas the most popular offers have 51. The mean availability is almost 60 days, but there are offers with 0 availability as well as offers with full availability (available throughout the entire year).

Categorical data

We're going to find out what neighbourhoods are present in the dataset and with what frequency they occur:
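A frequency table of this kind can be produced with `value_counts`. The sketch below assumes the column is called 'neighbourhood' and uses a handful of made-up rows, so the counts are illustrative only.

```python
import pandas as pd

# Made-up stand-in rows; the real column has one entry per listing.
listings = pd.DataFrame({
    "neighbourhood": [
        "Popincourt", "Buttes-Montmartre", "Popincourt",
        "Louvre", "Buttes-Montmartre", "Popincourt",
    ]
})

# Absolute counts, most frequent neighbourhood first.
counts = listings["neighbourhood"].value_counts()

# The same table as shares of all listings.
shares = listings["neighbourhood"].value_counts(normalize=True)
print(counts)
```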

Most offers come from Buttes-Montmartre and Popincourt.

Now, we are going to see how the neighbourhoods are distributed on the map of Paris.

Next, we want to check what types of properties are in the dataset.

In the dataset, rentals of entire homes or apartments outnumber all other types. Private rooms come second, followed by shared rooms and, lastly, hotel rooms.

Data visualisation

Correlation plot

First, we are going to look at the correlation heat map for all the variables.

The plot shows no strong correlations in the dataset. The only pair with a higher value is 'reviews_per_month' and 'number_of_reviews', which is expected because both measure review activity.
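The matrix behind such a heat map can be computed with `DataFrame.corr`, which uses Pearson correlation by default. The values below are made up, and the 'room_type' and 'price' column names are assumptions.

```python
import pandas as pd

# Made-up stand-in rows; the two review columns are built to move together.
listings = pd.DataFrame({
    "room_type": ["Private room", "Shared room", "Hotel room", "Private room", "Entire home/apt"],
    "price": [90, 60, 120, 75, 100],
    "number_of_reviews": [0, 5, 10, 20, 40],
    "reviews_per_month": [0.0, 0.4, 0.9, 1.5, 3.2],
})

# Pearson correlation between every pair of numeric columns.
corr = listings.select_dtypes("number").corr()
print(corr.round(2))
```

The resulting square table is what a heat map colours cell by cell; non-numeric columns such as the property type are left out.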

The relationship between the number of reviews and the price (by property type)

Although there is no linear correlation between these attributes, we are going to see how these variables look when split by property type.

For the first type of property (entire homes/apartments) it is difficult to find any relationship. For private rooms, offers from around 130 euro per night tend to have fewer reviews; not many exceed 30. For shared rooms, there are many observations with a very low or zero number of reviews, mostly among offers below 100 euro. For hotel rooms, the prices are clearly the highest, and the higher the price, the more reviews an offer tends to have.

The relationship between the availability and the price (by property type)

Similarly, these variables don't show a linear correlation, but we are going to see how they look when split by property type.

Here, too, the first type of property (entire homes/apartments) shows no relationship between the two variables. For private rooms priced above around 150 euro, availability tends to be higher, but there are still offers with 0 availability. Shared rooms in the medium price range show 0 availability, whereas availability varies for the cheapest ones. Hotel rooms, on the other hand, are characterised by high availability, although some observations with 0 availability can be found as well.

The average prices of rentals depending on their type

The prices per night in hotels are the highest, at 170 euro on average. Next come entire homes/apartments (around 90 euro) and private rooms (around 60 euro), and the cheapest are shared rooms, at 45 euro per night on average.

The exact mean values, together with the average number of reviews and availability, are presented below:
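A summary table like this one can be built with a group-by. The sketch assumes columns named 'room_type' and 'availability_365' (neither name is quoted in the text), and the rows are made up purely so the example runs.

```python
import pandas as pd

# Made-up stand-in rows, two per property type.
listings = pd.DataFrame({
    "room_type": ["Hotel room", "Hotel room", "Entire home/apt", "Entire home/apt",
                  "Private room", "Private room", "Shared room", "Shared room"],
    "price": [160, 180, 80, 100, 50, 70, 40, 50],
    "number_of_reviews": [10, 20, 5, 15, 8, 12, 0, 4],
    "availability_365": [200, 300, 30, 90, 0, 120, 10, 50],
})

# Mean price, review count and availability per property type, most expensive first.
summary = (
    listings.groupby("room_type")[["price", "number_of_reviews", "availability_365"]]
    .mean()
    .sort_values("price", ascending=False)
    .round(1)
)
print(summary)
```

Swapping the grouping column for the neighbourhood column would give the analogous per-neighbourhood table.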

The average prices of rentals depending on the neighbourhood they're located in

The most expensive neighbourhoods for rent are Louvre, Luxembourg and Élysée, where the average price per night is 115 euro. On the other hand, the cheapest ones are Gobelins and Buttes-Chaumont, where prices hover around 70 euro. Buttes-Montmartre, the neighbourhood with the most offers, is characterised by relatively low prices, on average slightly below 80 euro per night. A possible explanation is that more listings in an area mean more competition, so prices have to be competitive.

The exact mean values, together with the average number of reviews and availability, are presented below:

Top 100 offers with the highest number of reviews

The most popular neighbourhoods, meaning the ones with the highest number of reviews, are Popincourt and Vaugirard. On the other hand, the least popular are Opéra and Palais-Bourbon.

The colours of the points represent the availability of the offers. Mostly, these colours are yellowish and orangish, which indicates rather low to moderate availability. This makes sense: these are the listings that guests readily review, so they are expected to be in demand.

However, there are popular rentals with high availability. This can be explained by the somewhat higher prices of these offers (with some exceptions).

Nevertheless, most offers fall in the range of 50-150 euro per night. It can be said that the lower the price, the greater the number of rentals with low availability.

The exact values from the above figure are presented below:
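Selecting the most reviewed offers can be sketched with `nlargest`. The helper `top_reviewed` is a hypothetical name, the rows are made up, and the example takes the top 3 instead of the top 100 only because the stand-in table is tiny.

```python
import pandas as pd

def top_reviewed(df, n=100):
    """Return the n listings with the most reviews, most reviewed first."""
    return df.nlargest(n, "number_of_reviews")

# Made-up stand-in rows.
listings = pd.DataFrame({
    "neighbourhood": ["Popincourt", "Vaugirard", "Opéra", "Popincourt", "Louvre"],
    "price": [70, 85, 140, 60, 150],
    "number_of_reviews": [300, 250, 12, 280, 40],
})

top = top_reviewed(listings, n=3)

# How many of the selected offers fall into each neighbourhood.
per_neighbourhood = top["neighbourhood"].value_counts()
print(top)
```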

Top 100 offers presented on the map

The exact location of the most popular offers is presented on the map below.

The red icons mark the most popular tourist attractions in Paris. After clicking on the icon the name of the place is shown.

Most of the top offers are located at some distance from the main tourist attractions in Paris. More rentals (from the most popular group) can be found in the northern part of Paris.

The most booked offers presented on the map

The highest density of the most booked offers (yellow and orange) is observed in the north of Paris, with the largest number of rentals located on the outskirts. The likely reason is the more affordable prices of the offers located there.

The exact values are presented below:

Visualisation of all the offers on the map of Paris

The map presents all the offers from the dataset, where colour represents the price (on a given scale) and the size of a dot marks the number of reviews (the bigger the dot, the more reviews the offer has). Moreover, hovering over a point displays more detailed information, such as the exact price, number of reviews, name of the neighbourhood, the property type and its availability.

As could be expected, the observations located further from the city centre tend to have lower prices. The closer to the city centre, the more offers with higher prices per night. The number of reviews hardly depends on location: rentals with a high number of reviews can be found on the outskirts as well as in the city centre.

The advantage of this kind of visualisation is that if we are interested in a certain location, it is easy to zoom in on the map and compare the offers in a given area using the information shown when hovering over a particular point.

Conclusions

After conducting the analysis, all goals were met.

To conclude: